SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Authors

Soroush Mehri

Kundan Kumar

Ishaan Gulrajani

Rithesh Kumar

Shubham Jain

Jose Sotelo

Aaron C. Courville

Yoshua Bengio

Abstract

In this paper we propose a novel model for unconditional audio generation that generates one audio sample at a time. We show that our model, which combines memoryless modules (autoregressive multilayer perceptrons) with stateful recurrent neural networks in a hierarchical structure, is able to capture underlying sources of variation in temporal sequences over very long time spans, on three datasets of different nature. Human evaluation of the generated samples indicates that our model is preferred over competing models. We also show how each component of the model contributes to the exhibited performance.
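The sample-at-a-time generation described in the abstract can be pictured with a small toy sketch. This is not the authors' implementation: the weights are random and untrained, and every name and size here (`Q_LEVELS`, `FRAME_SIZE`, `HIDDEN`, `generate`) is a hypothetical choice made for illustration. It only shows the control flow of a stateful recurrent tier that updates once per frame and a memoryless tier that predicts each quantized sample from recent samples plus that conditioning.

```python
import math
import random

Q_LEVELS = 16      # hypothetical number of quantization levels
FRAME_SIZE = 4     # hypothetical number of samples per recurrent step
HIDDEN = 8         # hypothetical recurrent state size


def make_matrix(rows, cols, rng):
    """Random, untrained weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scale(sample):
    """Map an integer level in [0, Q_LEVELS) to the range [-1, 1]."""
    return 2.0 * sample / (Q_LEVELS - 1) - 1.0


def generate(n_samples, seed=0):
    rng = random.Random(seed)
    w_in = make_matrix(HIDDEN, FRAME_SIZE, rng)              # frame -> state
    w_rec = make_matrix(HIDDEN, HIDDEN, rng)                 # state -> state
    w_out = make_matrix(Q_LEVELS, HIDDEN + FRAME_SIZE, rng)  # output layer

    hidden = [0.0] * HIDDEN
    audio = [Q_LEVELS // 2] * FRAME_SIZE  # padding used as initial context
    while len(audio) < n_samples + FRAME_SIZE:
        # Stateful tier: one recurrent update per frame of past samples.
        frame = [scale(s) for s in audio[-FRAME_SIZE:]]
        pre = [a + b for a, b in zip(matvec(w_in, frame), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
        for _ in range(FRAME_SIZE):
            # Memoryless tier: sees only recent samples and the conditioning.
            recent = [scale(s) for s in audio[-FRAME_SIZE:]]
            probs = softmax(matvec(w_out, hidden + recent))
            audio.append(rng.choices(range(Q_LEVELS), weights=probs)[0])
    return audio[FRAME_SIZE:FRAME_SIZE + n_samples]
```

Calling `generate(10)` returns ten integer levels in `[0, Q_LEVELS)`. With untrained weights the output is noise; the point is only that each new sample is drawn conditioned on the samples generated before it.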

Similar resources

Char2wav: End-to-end Speech Synthesis

We present Char2Wav, an end-to-end model for speech synthesis. Char2Wav has two components: a reader and a neural vocoder. The reader is an encoder-decoder model with attention. The encoder is a bidirectional recurrent neural network that accepts text or phonemes as inputs, while the decoder is a recurrent neural network (RNN) with attention that produces vocoder acoustic features. Neural vocode...

Utilizing Domain Knowledge in End-to-End Audio Processing

End-to-end neural network based approaches to audio modelling are generally outperformed by models trained on high-level data representations. In this paper we present preliminary work that shows the feasibility of training the first layers of a deep convolutional neural network (CNN) model to learn the commonly used log-scaled mel-spectrogram transformation. Secondly, we demonstrate that upon i...

DeepSynth: Synthesizing A Musical Instrument With Video

This paper introduces DeepSynth, an end-to-end neural network model for generating the sound of a musical instrument based on a silent video of it being played. We specifically focus on building a synthesizer for the piano, but the ideas proposed in this paper are applicable to a wide range of musical instruments. At a high level, the model consists of a convolutional neural network (CNN) to ex...

SampleCNN: End-to-End Deep Convolutional Neural Networks Using Very Small Filters for Music Classification

Convolutional Neural Networks (CNN) have been applied to diverse machine learning tasks for different modalities of raw data in an end-to-end fashion. In the audio domain, a raw waveform-based approach has been explored to directly learn hierarchical characteristics of audio. However, the majority of previous studies have limited their model capacity by taking a frame-level structure similar to...

Conditional End-to-End Audio Transforms

We present an end-to-end method for transforming audio from one style to another. For the case of speech, by conditioning on speaker identities, we can train a single model to transform words spoken by multiple people into multiple target voices. For the case of music, we can specify musical instruments and achieve the same result. Architecturally, our method is a fully differentiable sequence-t...

Journal: CoRR

Volume: abs/1612.07837

Publication date: 2016
